FCEUd, Tutorial #3 (Compression in ROMs) [The Goonies 2] by Parasyte of DES (http://www.dragoneyestudios.net/) v1.0 (4-30-03) -===========--- 1) Warm-Up -===============--- Before beginning, it's best to know what is needed in order to use this document to it's full potential. First, you must know the very basics of assembly programming. (The basics are simple, understand each opcode addressing mode, and each opcode type) You must also be well versed in binary, including binary math. Overall, not much is needed to start using FCEUd. For 6502 assembly documents and tutorials, check out Zophar's Domain and ROM Hacking.com at the URLs below. As an alternative, search for "6502 Assembly" in everyone's favorite search engine - Google. Things you'll need: FCEUd 0.81.2 (get from http://www.dragoneyestudios.net/) The Goonies 2 ROM 6502 ASM docs, available from: http://www.zophar.net/ http://www.obelisk.demon.co.uk/6502/reference.html http://www.romhacking.com/ y0shi's NES Docs http://www.zophar.net/tech/files/ndox200.zip NESticle DOS x.xx (This version works under Windows XP) http://dragoneye.cg-games.net/hosted/parasyte/NESticleXP.zip -===========--- 2) About Compression -==============--- Most of us have been faced with the ordeals of compression at one time or another. The bad thing about running into compression when you're hacking is that you will be unable to edit the compressed data without decompressing it! This document will help you on your way to defeating most any compression format you come across. All you need is some assembly and programming knowledge... And of course a way to find the decompression routines in the game. For this, I typically use a debugger, such as the one available in FCEUd. Though with SNES, the best thing I've found has been SNES9x LT, with it's nice little assembly logging feature. Now, there are a few ways to deal with compressed data. The first method is just working around the compression. Leave your edited data somewhere in the ROM, uncompressed. And use some generic load routine to load your edited data after the original has already been decompressed. This method works well with ROMs with a lot of free space, or with expanded ROMs, but what about NES ROMs which are packed to the rim? NES ROM expansion is a long process that requires a lot of work, most of which is moving certain pieces of data to and from ROM banks. For the most part, that is just not something everyone can, or will do. For those people, I can only suggest writing a program to decompress the data. And another to recompress it once it's all edited. Recompressing is a very evil process, which I'll explain later, but it is very possible and not too difficult. -===========--- 3) Getting Down To Business -==============--- Now we're actually going to start doing something about this compression. And aren't we lucky? The game we're working on today uses a pretty simple compression format. Perhaps one of the easier formats I've seen. First thing you always have to do is load up the ROM, and open the FCEUd debugger. I think we'll start with the level graphics, so go ahead and get to the "Start\Continue" screen. Before going further, add a breakpoint. If you're new to FCEUd 0.81.2, you'll notice the Add Break window has more options, such as a "Memory" group box. This box let's you select which area of NES memory you're placing the breakpoint on. There's the orignal CPU memory, which all previous versions used when setting a breakpoint, and then there's also PPU and Sprite memory areas. These two are actually stored away on seperate chips on the NES hardware, so the CPU does not have direct access to those memory areas. Checking out y0shi's NES docs, (chapter 4, section B) we see that $0000 and $1000 are the addresses in PPU mem which holds the pattern tables. And if you've never used NESticle, the pattern tables contain the actual tile data used by the game. But we don't know which of the two pattern tables are used for background graphics. And until I build a pattern table viewer into FCEUd, you'll have to use NESticle to find out. So, load up NESticle, and get into the game. Open the pattern table viewer and you'll see that the right side of the pattern table viewer has all the tiles for the background, this means pattern table 1 ($1000) is used for backgrounds. So now that we know where we want to look, let's set a BPW on $1000 in PPU memory. Back in the game, push Start, and the debugger will snap just before the level is loaded. Look at the "PPU" edit box at the bottom of the debugger window. This box tells you the current PPU address which will be accessed by the $2007 register. And when you look at the disassembler, you'll see that an STA to $2007 caused the break. We are now right in the decompression routine. Not all games decompress directly to VRAM, some will decompress to RAM, then copy the decompressed data to VRAM later. Lucky for us, this game does decompress right into VRAM, so this will go by a bit quicker. The first thing you want to do when you're in the decompression routine is copy the entire thing into notepad. So scroll up in the disassembler window until you come across the first RTS opcode. I found the first RTS at $C232. So now I'll start at the next opcode, and copy all the way down to the next RTS. However, you'll reach an undefined instruction at $C2AF first. Looking above the address just a bit gives the reason for it. Instead of doing an RTS, the routine runs two branches. BEQ and BNE. These opcodes both test the status of one status flag (the Zero flag). If Zero is set, the BEQ will branch. Otherwise the BNE will branch. So then, any instructions following those branches will NEVER be executed. We'll just copy assembly ending with those. -===========--- 4) Reverse Engineering The Decompression Routine -==============--- We've got the decompression routine copied into notepad now, so the only thing left to do is reverse engineer it! Start with just a few instructions at a time. $C233:AA TAX $C234:BD 00 C0 LDA $C000,X @ $C004 = #$04 $C237:9D 00 C0 STA $C000,X @ $C004 = #$04 $C23A:9D 00 C0 STA $C000,X @ $C004 = #$04 This assembly just does some mapper-specific bank switching, which we can tell because it's storing a value to ROM. It's ignorable. $C23D:C8 INY $C23E:B1 68 LDA ($68),Y @ $C2F5 = #$04 $C240:85 6A STA $6A = #$00 $C242:C8 INY $C243:B1 68 LDA ($68),Y @ $C2F5 = #$04 $C245:85 6B STA $6B = #$80 This is a bit decieving, it looks like it's copying from $C2F5, but because Y has changed since this piece of code last executed, it's highly unlikely that $C2F5 is actually being used here. So let's reset the game, get back to the Start\Continue screen, delete any breakpoints, and set a new BPX on $C23E. After you push start, the debugger will snap, and you'll see the real address which is read from. $C23E:B1 68 LDA ($68),Y @ $C2F3 = #$00 $C240:85 6A STA $6A = #$31 Much better! So the value at $C23F is stored into $006A. Keep this in mind! Step through the assembly to see the next address read. $C243:B1 68 LDA ($68),Y @ $C2F4 = #$80 $C245:85 6B STA $6B = #$F2 So far, so good. Let's continue. $C247:A0 00 LDY #$00 $C249:AD 02 20 LDA $2002 = #$00 $C24C:B1 6A LDA ($6A),Y @ $8000 = #$10 $C24E:8D 06 20 STA $2006 = #$00 Now this is some good stuff here. at $C24C, the LDA loads from the address held in $006A! That tells us the previous code loaded a pointer from $C2F3. And in fact, the pointer loaded was $8000. So, now you know how to find pointers! The next interesting thing about this piece of code is that reading $2002 resets $2006, and $2006 is for setting up the PPU address to access through $2007. Just step through this code, and you'll see the "PPU" box at the bottom of the debugger window being changed. So far, this code has done a bank swap, then it loaded a pointer to $8000. Now it's loading the PPU address from the pointer, and setting up the PPU for writes. $C251:C8 INY $C252:B1 6A LDA ($6A),Y @ $8001 = #$00 $C254:8D 06 20 STA $2006 = #$00 Here it is loading the last portion of the PPU address from the pointer. $C257:C8 INY $C258:B1 6A LDA ($6A),Y @ $8002 = #$10 $C25A:C9 FF CMP #$FF $C25C:F0 20 BEQ $C27E $C25E:C9 80 CMP #$80 $C260:B0 2C BCS $C28E Next, it loads the following value, and compares it to $FF. If the loaded value is equal, it branches off. If not, it branches if the loaded value is greater than or equal to $80. Neither of these are true, concidering the loaded value is $10, so let's continue. $C262:85 6C STA $6C = #$DE $C264:C8 INY $C265:B1 6A LDA ($6A),Y @ $8003 = #$00 $C267:8D 07 20 STA $2007 = #$00 $C26A:C6 6C DEC $6C = #$10 $C26C:D0 F9 BNE $C267 This code stores the previously loaded value into a variable at $006C, then it runs in a loop for that number of times, copying data to PPU memory. As you step through this portion of the code, watch the PPU box. It automatically increments the address when $2007 is written to. We now know the routine is beginning decompression. The code which compared the loaded value against $FF and $80 must be for the compression control bytes. With this in mind, let's write some C to do the same thing as the above code. void Decomp(u8 *input, u8 *output) { u8 i=0,j=0,k,ctrl; ctrl=input[i++]; if (ctrl == 0xFF) goto whatever; if (ctrl <= 0x80) { for (k = 0; k < ctrl; k++) output[j++] = input[i]; } whatever: } Not so bad, is it? Do both tests, and the single-byte copy loop. Let's continue with the assembly. $C26E:98 TYA $C26F:18 CLC $C270:65 6A ADC $6A = #$00 $C272:85 6A STA $6A = #$00 $C274:A5 6B LDA $6B = #$80 $C276:69 00 ADC #$00 $C278:85 6B STA $6B = #$80 This code updates $006A, adding the number of bytes already read. Our C code already does this for us. $C27A:A0 00 LDY #$00 $C27C:F0 D9 BEQ $C257 Interesting code, it loads zero, and then branches if the result was zero. I think it's needless to say that this code will always branch. It goes back up to $C257, which loads the next control byte. Let's update the C to loop back to the control byte code. void Decomp(u8 *input, u8 *output) { u8 i=0,j=0,k,ctrl; ctrl=input[i++]; while (1) { if (ctrl == 0xFF) goto whatever; if (ctrl <= 0x80) { for (k = 0; k < ctrl; k++) output[j++] = input[i]; } whatever: } } Now go back to the assembly, and step through until the control byte code goes through one of those branches. And lucky us! the very next step through the control byte code branch us to $C28E. $C28E:29 7F AND #$7F $C290:85 6C STA $6C = #$00 Remember when $006C was used before? It was the temporary variable used in the single-byte copy loop. For that, we used the k variable. So this will be no exception. It just masks the control byte with $7F, and stores it to k. $C292:C8 INY $C293:98 TYA $C294:18 CLC $C295:65 6A ADC $6A = #$03 $C297:85 6A STA $6A = #$03 $C299:A5 6B LDA $6B = #$80 $C29B:69 00 ADC #$00 $C29D:85 6B STA $6B = #$80 Some more code to update the input pointer. Useless to us. $C29F:A0 FF LDY #$FF $C2A1:C8 INY $C2A2:B1 6A LDA ($6A),Y @ $8005 = #$7C $C2A4:8D 07 20 STA $2007 = #$00 $C2A7:C6 6C DEC $6C = #$01 $C2A9:F0 C3 BEQ $C26E $C2AB:D0 F4 BNE $C2A1 Ahh! The end of our routine. Remember when we copied the routine to notepad? This ended up being the end, because the code following those two branches cannot be executed. Anyway, This code is similar to the single-byte copy loop. The difference is this copies a lot of bytes at once. Time to update the C routine. void Decomp(u8 *input, u8 *output) { u8 i=0,j=0,k,ctrl; ctrl=input[i++]; while (1) { if (ctrl == 0xFF) goto whatever; if (ctrl <= 0x80) { for (k = 0; k < ctrl; k++) output[j++] = input[i]; } else { for (k = 0; k < (ctrl&0x7F); k++) output[j++] = input[i++]; } whatever: } } Alright, cool. Looks like all that's left is figuring out what's up with control byte 0xFF. And because our C code just loops forever, I have a feeling control byte 0xFF is the one which signifies the end of the data. But, instead of stepping through thousands of instructions, waiting until 0xFF is loaded, let's place a breakpoint on the address which is branched to when 0xFF is loaded. So, BPX on $C27E, in CPU mem, of course. Hit Run and check out the assembly. $C27E:A5 68 LDA $68 = #$F2 $C280:18 CLC $C281:69 03 ADC #$03 $C283:85 68 STA $68 = #$F2 $C285:A5 69 LDA $69 = #$C2 $C287:69 00 ADC #$00 $C289:85 69 STA $69 = #$C2 $C28B:4C 2A C2 JMP $C22A I don't know the importance of $0068 or $0069, so let's continue to that jump. $C22A:A0 00 LDY #$00 $C22C:B1 68 LDA ($68),Y @ $C2F5 = #$04 $C22E:C9 FF CMP #$FF $C230:D0 01 BNE $C233 $C232:60 RTS Well, this is interesting. $0068 is another pointer. This assembly loads from the pointer, and if the value is $FF, it exits! Otherwise the decompression continues, starting all over, with a new data pointer. The data at $C2F5 contains the bank number to swap to (see $C233 above) followed by the pointer to compressed data. Compressed data contains a target pointer, the address where the data gets decompressed to, followed by the actual compressed data. We now have enough information to complete the decompression routine, and even some to start a level editor. However, that goes far beyond the scope of this text. I will leave you with the final, optimized decompress routine in C. void Decomp(u8 *input, u8 *output) { u8 i=2,j=0,k,ctrl; while ((ctrl=input[i++]) != 0xFF) { if (ctrl <= 0x80) for (k = 0; k < ctrl; k++) output[j++] = input[i]; else for (k = 0; k < (ctrl&0x7F); k++) output[j++] = input[i++]; } } -===========--- 5) Final Thoughts -==============--- Now that you have a decompression routine written in C, a decompressor program will be very easy to build. Note that in Decomp(), I've set i to initiallize as 2, skipping the two bytes that make up the target pointer. This way, the program can load pointers to the compressed data from the data containing bank numbers, etc. Overall, this has been one of the easiest compression formats I've seen, so this is a very good one to tackle in a tutorial. And remember that the more simplistic the format, the worse it compresses. More complex formats usually have much better ratios. After your decompression program is complete, you'll want a program to recompress the data. This part of the process always takes the longest. The idea behind it is simple: reverse the decompression routine to make it into a compressor. (Check out my FCEUd Tutorial #2 for an example of reversing the operations of a routine) In my experience, making the compressor isn't too complicated. But, getting your recompressed data to fit in the allocated space really is a challenge. Some times, you'll have to change pointers, and move data to get it all to fit. That's why the recompression process is always a pain in the ass. Sometimes I felt it would be best to rewrite the decompression routine in the game so the edited data will actually fit, if the routine is better. Well, good luck with that part, you'll probably need it. -===========--- 6) Legal Information -===============--- This document is Copyright 2003 Parasyte\Dragon Eye Studios. This doument may not be modified in any way, in part or whole without permission of the author. Parasyte and Dragon Eye Studios are in no way affiliated with Nintendo. Addams Family et.al and Nintendo are registered trademarks of thier respected owners. Parasyte and Dragon Eye Studios are not responsible for how you use the information contained in this document. We are not responsible for the fact that it may use your toilet and forget to flush. Don't chew with your mouth open! EOF